A linguistic and prosodic database for data-driven Japanese TTS synthesis
نویسندگان
چکیده
We propose a method to generate a database that contains a parametric representation of F0 contours associated with linguistic and acoustic information, to be used by data-driven Japanese text-to-speech (TTS) systems. The configuration of the database includes recorded speech, F0 contours and their parametric labels, phonetic transcription with durations, and other linguistic information such as orthographic transcription, part-of-speech (POS) tags, and accent types. All information that is not available by dictionary lookup is obtained automatically. In this paper, we propose a method to automatically obtain parametric labels that describe F0 contours based on a superpositional model. Preliminary tests on a small data set show that the method can find the parametric representation of F0 contours with acceptable accuracy, and that accuracy can be improved by introducing additional linguistic information.
منابع مشابه
Template-driven generation of prosodic information for Chinese concatenative synthesis
In this paper, a template-driven generation of prosodic information is proposed for Chinese text-to-speech conversion. A set of monosyllable-based synthesis units is selected from a large continuous speech database. The speech database is employed to establish a word-prosody-based template tree according to the linguistic features: tone combination, word length, part-of-speech (POS) of the word...
متن کاملطراحی و ارزیابی یک مدل بازسازی گفتار به روش همگذاری واحدهای حساس به بافت نوایی
This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. Thesyllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...
متن کاملPerceptual Evaluation of Quality Deterioration Owing to Prosody Modification
Our reasearch goal is to construct a Japanese TTS (Text-to-Speech) system that can output various kinds of prosody. Since such synthetic speech is useful for a practical use, many TTS systems have implemented global prosodic control processing. But fundamentally they're designed to output speech with standard pitch and speech rate. We discuss synthesis method for high quality speech with extrem...
متن کاملStatistical Methods in Data-driven Modeling of Spanish Prosody for Text to Speech
1 In [1], we proposed an automatic data-driven methodology to model both fundamental frequency and segmental duration in TTS converters from a monospeaker recorded corpus. Therefore, it had the advantage that could be adapted to a specific corpus or a particular speaker. The main disadvantage was the size of the obtained prosodic database. In this paper, we propose to use some statistical metho...
متن کاملStatistical methods in data-driven modeling of Spanish prosody for text to speech
1 In [1], we proposed an automatic data-driven methodology to model both fundamental frequency and segmental duration in TTS converters from a monospeaker recorded corpus. Therefore, it had the advantage that could be adapted to a specific corpus or a particular speaker. The main disadvantage was the size of the obtained prosodic database. In this paper, we propose to use some statistical metho...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1998